ABSTRACT

We use the first 25% of the DEEP2 Galaxy Redshift Survey spectroscopic data to identify groups 
and clusters of galaxies in redshift space. The data set contains 8370 galaxies with confirmed redshifts 
in the range 0.7 < z < 1.4, over one square degree on the sky. Groups are identified using an algorithm 
(the Voronoi-Delaunay Method) that has been shown to accurately reproduce the statistics of groups
in simulated DEEP2-like samples. We optimize this algorithm for the DEEP2 survey by applying it to 
realistic mock galaxy catalogs and assessing the results using a stringent set of criteria for measuring 
group-finding success, which we develop and describe in detail here. We find in particular that the 
group-finder can successfully identify ~ 78% of real groups and that ~ 79% of the galaxies that are 
true members of groups can be identified as such. Conversely, we estimate that ~ 55% of the groups 
we find can be definitively identified with real groups and that ~ 46% of the galaxies we place into 
groups are interloper field galaxies. Most importantly, we find that it is possible to measure the 
distribution of groups in redshift and velocity dispersion, n(σ, z), to an accuracy limited by cosmic
variance, for dispersions greater than 350 km s^-1. We anticipate that such measurements will allow
strong constraints to be placed on the equation of state of the dark energy in the future. Finally, we 
present the first DEEP2 group catalog, which assigns 32% of the galaxies to 899 distinct groups with 
two or more members, 153 of which have velocity dispersions above 350 km s^-1. We provide locations,
redshifts and properties for this high-dispersion subsample. This catalog represents the largest sample 
to date of spectroscopically detected groups at z ~ 1. 
Subject headings: Galaxies: high-redshift — galaxies: clusters: general 



1. INTRODUCTION 

Groups and clusters of galaxies are the most massive 
dynamically relaxed objects in the universe; as such, 
they have long been the subject of intense and fruitful 
study. More than seventy years ago observations of the 
Coma cluster gave the first evidence for the existence of 
dark matter (Zwicky 1933). More recently, studies of
gravitational lensing by clusters have yielded intriguing
new information about the profiles of dark matter halos
(Sand et al. 2004). Identifying and studying galaxies
in groups and clusters is essential to understanding
the effects of local environment on galaxy formation and
evolution (for a review see Bower & Balogh 2004). X-ray
measurements of the gas mass fraction in clusters have
been used to constrain the mass density parameter Ω_m
and more recently the equation of state of the dark energy,
w (e.g., Allen et al. 2004 and references therein).
Finally, if we can accurately measure the abundance
of groups and its evolution with redshift, we can constrain
the growth of large-scale structure, thereby placing
significant further constraints on cosmological parameters
(Lilje 1992; Eke et al. 1996; Borgani et al. 1999;
Haiman et al. 2001; Holder et al. 2001; Newman et al. 2002).

A wide array of methods has been used to identify
groups and clusters at moderate redshifts: X-ray
emission from hot intracluster gas (reviewed
by Rosati, Borgani & Norman 2002), cosmic shear
due to weak gravitational lensing (reviewed by
Refregier 2003), searches in optical photometric
data (e.g., Gonzalez et al. 2001; Yee & Gladders 2002),
the Sunyaev-Zel'dovich (SZ) effect in the Cosmic
Microwave Background (e.g., LaRoque et al. 2003), and
direct reconstruction of three-dimensional objects in
galaxy redshift surveys (e.g., Eke et al. 2004). To study
the evolution of the group abundance it is necessary to 
extend observations to more distant objects. However, 
most of the methods used for local studies have only 
limited effectiveness at high redshift. The apparent surface
brightness of X-ray clusters dims as (1 + z)^-4, making
only the richest clusters visible at high redshift. The
cross-section for gravitational lensing falls rapidly at high 
redshifts, making weak-lensing detection of distant clus- 
ters difficult for all but the most massive objects. In 
photometric surveys, the increased depth necessary for 
high-redshift studies increases the overall number den- 
sity of objects, thereby increasing the problems of fore- 
ground and background contamination and projection ef- 
fects (though photometric techniques for estimating redshifts
can mitigate these difficulties). The SZ effect is
very promising, since it is entirely independent of red- 
shift, but it also suffers from confusion limits and projec- 
tion effects, and in any case a large survey of SZ clusters 
is yet to be undertaken. For the time being, then, one of 
the few methods that can be applied to large numbers of 
groups and clusters on similar mass scales at z ~ 0 and
z ~ 1 is the direct detection of these structures in the 
redshift-space distribution of galaxies. 
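
As a quick numerical illustration of the (1 + z)^-4 surface-brightness dimming mentioned above, the following minimal sketch evaluates the dimming factor at a few redshifts. The function name and the choice of redshifts are ours, for illustration only.

```python
def surface_brightness_dimming(z):
    """Return the factor (1 + z)**-4 by which apparent surface brightness is reduced."""
    if z < 0:
        raise ValueError("redshift must be non-negative")
    return (1.0 + z) ** -4


# Illustrative redshifts only; they are not tied to any particular survey sample.
for z in (0.0, 0.25, 0.55, 1.0, 1.4):
    factor = surface_brightness_dimming(z)
    print(f"z = {z:4.2f}: dimming factor = {factor:.4f}")
```

The steep dependence on redshift is what restricts X-ray detection at z ~ 1 to the richest clusters.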

The first sizeable sample of groups detected in redshift
space was presented by Geller & Huchra (1982),
who found 176 groups of three or more galaxies in the
CfA galaxy redshift survey at redshifts z < 0.03. Recently,
Eke et al. (2004) identified groups within the final
data release of the Two-degree Field Galaxy Redshift
Survey (2dFGRS). Their catalog extends to z ≈ 0.25
and constitutes the largest currently available catalog of
galaxy groups, containing ~ 3 × 10^4 groups with two
or more members. A comprehensive listing of previous 
studies of local optically selected group samples is also 
given by these authors. Work is currently underway to 
detect groups of galaxies in the spectroscopic data of
the Sloan Digital Sky Survey (Nichol 2004). Studies of
groups detected in redshift space were extended to intermediate
redshifts by Carlberg et al. (2001), who found
more than 200 groups in the CNOC2 redshift survey,
with redshifts in the range 0.1 < z < 0.55. Additionally,
Cohen et al. (2000) studied a sample of 23 density peaks
in redshift space, over the redshift range 0 < z < 1.25.
Until now, however, no spectroscopic sample has existed 
with sufficient size, sampling density and redshift accu- 
racy to extend redshift-space studies of large numbers of 
groups to redshifts z > 0.5. 

The DEEP2 Galaxy Redshift Survey (Davis et al. 2003;
Faber et al., in prep.) is the first large spectroscopic
galaxy catalog at high redshift, with observations
planned for ~ 5 × 10^4 galaxies, most of which will fall
in the range 0.7 < z < 1.4. The survey is thus a unique 
dataset for studying high-redshift galaxy groups. With 
such a broad range in redshift, we expect to observe evo- 
lution in the properties of galaxies, groups, and clusters 
within the DEEP2 sample itself; also, by comparing to 
local samples from 2dFGRS and the Sloan Digital Sky 
Survey (SDSS), we expect to observe evolution between 
z ~ 1 and the present epoch. DEEP2 is especially well- 
suited to studies of groups, since its high redshift accu- 
racy allows detailed studies of their internal kinematics. 
Repeated observations of some DEEP2 galaxies indicate
a velocity accuracy δv ~ 25 km s^-1, considerably better
than the 2dFGRS value δv ≈ 85 km s^-1 (Colless et al.
2001) and similar to the δv ~ 30 km s^-1 attained in the
SDSS (Stoughton et al. 2002). In particular, it will be
possible to estimate the masses of DEEP2 groups from 
their velocity dispersions. We anticipate that, by mea- 
suring the evolution of the group velocity function with 
redshift, it will be possible to constrain cosmological 
parameters such as the dark energy density parameter 
Ω_Λ and equation of state parameter w, as outlined in
Newman et al. (2002). Before carrying out such studies,
however, it will be essential to develop robust methods 
for detecting groups and clusters within DEEP2. 

Identifying groups and clusters in redshift space is 
well known to be a difficult task. Most notably, clus- 
tering information is smeared out by redshift-space distortions
like the so-called fingers-of-God effect, in which
galaxies in groups and clusters appear highly elongated 
along the line of sight because of intracluster peculiar 
motions. This intermingles group members with other 
nearby galaxies and causes neighboring groups to over- 
lap in redshift space. A group-finding algorithm that 
attempts to find all group members will thus neces- 
sarily be contaminated by interloper field galaxies and 
will necessarily merge some distinct groups together into 
spurious larger structures. Conversely, a group finder 
that aims to minimize contamination and over-merging 
will fragment some larger clusters into smaller groups. 
This trade-off in group-finding errors is well known (e.g.,
Nolthenius & White 1987) and fundamentally unavoidable.
It will therefore be essential, before we begin any
program of group finding, to construct a suitable def- 
inition of group-finding success, identifying in advance 
which errors we seek to minimize and what sort of er- 
rors we are willing to tolerate. Ultimately, the chosen 
definition of success will depend on the intended scien- 
tific purpose of the group catalog. A major portion of 
this paper will be devoted to defining appropriate mea- 
sures of group-finding success for the DEEP2 survey and
optimizing our methods using these criteria. 

This paper is organized as follows. In Section 2 we
introduce the DEEP2 sample and discuss the unique opportunities
and difficulties it presents for group finding.
In the same section we describe the DEEP2 mock galaxy
catalogs, which we will use to calibrate our group-finding
methods. Then, in Section 3, we describe our criteria for
group-finding success. In Section 4 we give an overview
of various group-finding methods that have been used
in the literature, and we describe the Voronoi-Delaunay
Method (VDM) of Marinoni et al. (2002), which we will
use in this study. We then proceed to optimize this
method for the DEEP2 sample. Finally, we apply the
VDM algorithm to the current DEEP2 observations
and present the first DEEP2 group catalog.

2. GROUP FINDING IN THE DEEP2 SURVEY 
2.1. The DEEP2 sample 

As mentioned, the DEEP2 Galaxy Redshift Survey is 
the first large (tens of thousands of galaxies) spectro- 
scopic survey of galaxies at high redshifts, z ~ 1. The 
goal of the survey is to obtain spectra of ~ 5 × 10^4 objects
over 3.5 deg^2 on the sky to a limiting magnitude
of R_AB = 24.1 using the DEIMOS spectrograph on the
Keck II telescope. Typical redshifts in the survey fall
in the range 0.7 < z < 1.4. Details of the survey will
be described comprehensively in an upcoming paper by
Faber et al. (in prep.); we summarize the salient information
for this study here.

The survey consists of four fields on the sky, chosen to 
lie in zones of low Galactic dust extinction. Three-band 
(BRI) photometry has been obtained for each of these 
fields using the CFH12K camera on the Canada-France-Hawaii
Telescope (CFHT), as described by Coil et al.
(2004b). In three of the fields, which each consist of three
contiguous CFHT pointings covering a strip of 120 arcmin
by 30 arcmin, galaxies are selected for spectroscopy
if they pass a simple cut in color-color space. This cut 
reduces the fraction of galaxies at redshifts z < 0.7 to be- 
low 10%, while eliminating only ~ 3% of higher-redshift 
TABLE 1

Locations and observational status of
the DEEP2 pointings considered in this paper.

Pointing name (a)   RA (b)      Dec (b)      Redshift success (c)
22                  16 51 30    +34 55 02    0.72
32                  23 33 03    +00 08 00    0.71
42                  02 30 00    +00 35 00    0.70

(a) The pointings are named according to a
convention in which, for example, pointing
32 refers to the second CFHT photometric
pointing in the third DEEP2 field.

(b) Positions of the pointing centers on the sky are
given in J2000 sexagesimal coordinates.

(c) Fraction of spectroscopic targets for
which a definite redshift could be measured.



galaxies (Faber et al., in prep.). A fourth field, the extended
Groth Survey strip, covers 120 arcmin × 15 arcmin,
and because of a wide variety of complementary
observations underway there, galaxies in this field are 
targeted for spectroscopy regardless of color. For the 
sake of uniformity we neglect the Groth field in the cur- 
rent study, though it will be quite useful in future work. 
Within each CFHT pointing, galaxies are selected for 
spectroscopic observation if they can be placed on one 
of the ~ 40 DEIMOS slitmasks covering that pointing. 
Within each pointing, slitmasks are tiled in an overlap- 
ping pattern, using an adaptive tiling scheme to increase 
the sampling rate in regions of high density on the sky, 
so that the vast majority of galaxies have two opportunities
to be selected for spectroscopy. Further details of
the observing scheme can be found in Davis et al. (2004).
Overall, roughly 60% of galaxies that meet our selection 
criteria are targeted for DEIMOS observation. 

Spectroscopic data from DEIMOS are reduced using
an automated data-reduction pipeline (Newman et al.,
in prep.), and redshift identifications are confirmed visually.
In this paper, we focus on galaxies in the three
CFHT pointings in which all spectroscopy has been completed
as of this writing. The locations of these pointings
are given in Table 1. Each pointing has a width
of 48 arcmin in right ascension and 28 arcmin in dec- 
lination. These fields have each been fully covered, or 
very nearly so, by DEIMOS spectroscopy, with a red- 
shift success rate greater than 60% for each slitmask and 
an overall redshift success rate of ~ 70%. These three 
fields, taken together, comprise a sample of 8785 galaxies 
with confirmed redshifts (8370 with 0.7 < z < 1.4), with 
a median redshift of z = 0.912. This sample represents 
the largest sample ever used for group-finding in redshift 
surveys of distant (z > 0.25) galaxies, being more than
twice as large as the CNOC2 sample of Carlberg et al.
(2001), and the first such sample at z ~ 1. 

2.2. The DEEP2 mock catalogs

Both to assess the impact of selection effects and to 
test and calibrate our group-finding, it will be neces- 
sary to study the properties of groups in realistic mock 
galaxy catalogs. For this purpose we will use the mock
catalogs developed by Yan et al. (2003). These catalogs
are produced by assigning "galaxies" to N-body simulations
according to the prescriptions of the popular
"halo model" for large-scale structure formation (e.g.,
Seljak 2000; Peacock & Smith 2000). This model assumes
that all galaxies form within virialized dark matter
halos. The mean number of galaxies above some
luminosity L_cut in a halo of mass M is then given by
the Halo Occupation Distribution N(M) (Berlind et al.
2003; Marinoni & Hudson 2002), while the luminosities
of galaxies in the halo obey a Conditional Luminosity
Function, Φ(L|M) (Yang et al. 2003), which is allowed to
evolve with redshift in keeping with observations. These
functions can be varied to produce mock galaxy catalogs
that match the observed DEEP2 redshift distribution
and clustering statistics, as measured by Coil et al.
(2004a). In the DEEP2 mock catalogs used here, the
"galaxies" populating a given host halo are assigned po- 
sitions and velocities as follows: the brightest galaxy in a 
halo is placed at the halo's center of mass, and all other 
galaxies are assigned to random dark matter particles 
within the halo. 

For the purposes of this work, we will use the most 
recent version of the mock catalogs produced using simulation
4 from Table 1 of Yan et al. (2003); for further
details about the creation of the DEEP2 mock galaxy 
catalogs, the reader is referred to that paper. Here 
we merely note in summary that the catalogs comprise 
twelve nearly independent mock DEEP2 fields with the 
same geometry as the three high-redshift DEEP2 fields, 
extending over a redshift range 0.6 < z < 1.6. They 
have been constructed by populating N-body simulations 
computed in a flat ΛCDM cosmology with density parameter
Ω_m = 0.3, fluctuation amplitude σ_8 = 0.9, spectral
index n = 0.95, and dimensionless Hubble parameter
h = H_0/(100 km s^-1 Mpc^-1) = 0.7. The evolution of large-scale
structure with redshift is included in the mocks by stack- 
ing different time slices from the N-body simulations 
along the line of sight. The simulations resolve dark matter
halos down to masses around 8 × 10^10 h^-1 M_⊙, sufficiently
low to encompass all galaxies above L_cut = 0.1L*.
This luminosity cut, in turn, is sufficiently low to be below
the DEEP2 magnitude cut for the redshift range of
interest here, z > 0.7. We have tested our group-finding 
methods on mock catalogs created using different halo 
model parameters, and we find that the results presented 
in this paper are acceptably robust to such changes (i.e., 
their effects on the reconstructed group catalog are gen- 
erally smaller than the cosmic variance). 

In order to study the impact of galaxy selection effects 
on our group sample, we produce four distinct subsam- 
ples from the mock catalogs. The volume-limited sam- 
ple contains all galaxies down to a limiting magnitude 
L_min = 0.1L* (it is important to note that this catalog
is not "volume-limited" in the traditional sense, since
L*, and hence L_min, varies with redshift in the mock
catalogs). The magnitude-limited sample has had the 
DEEP2 magnitude limit of R_AB < 24.1 applied, cutting
out the faint galaxies in the volume-limited sample
in a distance-dependent way (the mock catalogs do not 
contain color information, so no color cut is applied; we 
simply take the DEEP2 color criteria to be equivalent 
to the redshift limit z > 0.7). The masked sample is 
the result of applying the DEEP2 "mask-making" algorithm
(see Davis et al. (2004) and Faber et al. (in prep.)
for details), which schedules galaxies for slitmask spectroscopy,
to the magnitude-limited sample. Because the
amount of space on DEIMOS slitmasks is finite, and be- 
cause neighboring slits' spectra may not overlap, only 
~ 60% of suitable target galaxies can be scheduled for 
observation. 

Finally, the mock DEEP2 sample simulates the effects 
of redshift failures within the observed DEEP2 sample. 
Currently, approximately 30% of observed DEEP2 galax- 
ies cannot be assigned a firm redshift, in large part be- 
cause of the presence of galaxies at z > 1.5, for which 
no strong spectral features fall in the DEEP2 wavelength 
range, but also because of poor observing conditions, low 
signal-to-noise ratio, or instrumental effects. The red- 
shift success rate also has some magnitude dependence 
for faint galaxies, dropping by ~ 15% between R = 22.6 
and R = 24.1. These effects are fully taken into ac- 
count in the mock DEEP2 sample. Since this sample is 
the most similar to the actual DEEP2 redshift catalog, 
we will use it to test and calibrate our group-finding al- 
gorithm; we will use the other three samples to study 
various selection effects. 
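
The nesting of the four mock subsamples described above can be sketched in a few lines of Python. This is only a schematic illustration under strong simplifying assumptions: the real slitmask-making algorithm depends on the positions of neighboring targets, and the real redshift success rate depends on magnitude, whereas here both are replaced by uniform random selection at the approximate rates quoted in the text (60% and 70%). All function names and the dictionary keys `R_AB` and `z` are hypothetical.

```python
import random

R_AB_LIMIT = 24.1        # DEEP2 magnitude limit quoted in the text
Z_MIN = 0.7              # redshift limit used as a proxy for the color cut
TARGET_FRACTION = 0.60   # approximate fraction of targets placed on slitmasks
REDSHIFT_SUCCESS = 0.70  # approximate fraction of observed galaxies with a firm redshift


def magnitude_limited(volume_limited):
    """Apply the apparent-magnitude limit and the z > 0.7 proxy for the color cut."""
    return [g for g in volume_limited if g["R_AB"] < R_AB_LIMIT and g["z"] > Z_MIN]


def masked(mag_limited, rng):
    """Crude stand-in for slitmask making: keep each galaxy with fixed probability."""
    return [g for g in mag_limited if rng.random() < TARGET_FRACTION]


def mock_deep2(masked_sample, rng):
    """Crude stand-in for redshift failures: keep each galaxy with fixed probability."""
    return [g for g in masked_sample if rng.random() < REDSHIFT_SUCCESS]


def build_subsamples(volume_limited, seed=0):
    """Return the four nested subsamples as a dictionary of lists."""
    rng = random.Random(seed)
    mag = magnitude_limited(volume_limited)
    msk = masked(mag, rng)
    obs = mock_deep2(msk, rng)
    return {
        "volume_limited": list(volume_limited),
        "magnitude_limited": mag,
        "masked": msk,
        "mock_deep2": obs,
    }


# Toy input catalog with made-up magnitudes and redshifts (not real mock data).
toy_rng = random.Random(42)
toy_catalog = [
    {"R_AB": toy_rng.uniform(22.0, 26.0), "z": toy_rng.uniform(0.6, 1.6)}
    for _ in range(5000)
]
for name, sample in build_subsamples(toy_catalog).items():
    print(f"{name:18s} {len(sample):5d} galaxies")
```

With these two rates, roughly 0.6 × 0.7 ≈ 42% of suitable targets end up with a measured redshift, which is why each group is only sparsely sampled.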

2.3. Difficulties for group finding in deep redshift 
surveys 

Identifying an unbiased sample of groups and clus- 
ters of galaxies in redshift space is notoriously difficult. 
As mentioned in Section 1, the most obvious and well-known
complication is redshift-space distortions: the orbital
motions of galaxies in virialized groups cause the
observed group members to appear spread out along the 
line of sight (the fingers-of-God effect), while coherent in- 
fall of outside galaxies into existing groups and clusters 
reduces their separation from group centers in the red- 
shift direction (the Kaiser effect). Both of these effects 
confuse group membership by intermingling group mem- 
bers with other nearby galaxies. Since it is impossible 
to separate the peculiar velocity field from the Hubble 
flow without an absolute distance measure, this confu- 
sion can never be fully overcome, and it will be a sig- 
nificant source of error in any group-finding program in 
redshift space. A second complication arises from in- 
complete sampling of the galaxy population. No mod- 
ern galaxy redshift survey can succeed in measuring a 
redshift for every target galaxy, and it has been shown 
(Szapudi & Szalay 1996) that an incomplete galaxy sampling
rate always leads to errors in the reconstructed catalog
of groups and clusters, even without redshift-space
distortions.

In addition, surveys conducted at high redshift and 
over a broad redshift range present their own impedi- 
ments to group finding. The first is simple: distant galax- 
ies appear fainter than nearby galaxies. For example, 
the DEEP2 R_AB = 24.1 magnitude limit means that the
faintest DEEP2 galaxies at z ~ 1 have luminosities near
L* (Willmer et al., in prep.). We are thus probing only
relatively rare, luminous galaxies, so only a small fraction 
of a given group's members will meet our selection cri- 
teria. Moreover, galaxies selected with the same criteria 
will correspond to different samples at different redshifts. 
Selection in the R band (as is done for DEEP2) corresponds
to a rest-frame B band selection at z ≈ 0.7 and
a rest-frame U band selection at z > 1.1, meaning that
red, early-type galaxies will drop below the limiting magnitude
at lower redshifts than blue, star-forming galaxies.
Since blue galaxies are observed to be less strongly
clustered than red galaxies in DEEP2 (Coil et al. 2004a)
and locally (e.g., Madgwick et al. 2003a), we expect
that the density contrast between group members and
isolated galaxies will be weaker for the DEEP2 sample
than it would be for a sample selected in rest-frame I, for
example. Finally, the very evolution of large-scale structure
with redshift that one wishes to probe will pose a
problem, since the mass function of dark matter halos
will be shifted to lower masses at high redshift, leading
to smaller groups and clusters.

Fig. 1. Rates of spectroscopic observation and redshift success
as a function of local density of DEEP2 target galaxies on the
sky, as measured by the distance D_3 of a galaxy from its third-nearest
neighbor. The dashed line shows the probability that a
galaxy meeting the DEEP2 targeting criteria is scheduled for spectroscopic
observation, as a function of D_3. The solid line shows
the probability that an observed galaxy yields a successful redshift,
multiplied by the dashed line, to give the total probability that a
potential DEEP2 target has its redshift measured, as a function of
D_3. The top axis shows the percentage of DEEP2 galaxies that
have D_3 less than the indicated value, and the dotted line shows
the scale of a typical cluster core (300 kpc) at z = 1. Clearly, the
probability of observation is reduced in regions of high local density,
although local density appears to have little further effect on
redshift success. The sharp increase in the ratios at very low D_3
arises because extremely close pairs of galaxies may be observed
together on a single slit.

A further, more complicated problem is posed by the 
realities of multi-object spectroscopy. Because of the 
physical limitations of slitmask or fiber-optic spectro- 
graphs, it is difficult to observe all galaxies in densely 
clustered regions. In DEEP2, for example, the minimum 
DEIMOS slit length is three arcseconds (approximately 
20 kpc at z ~ 1); objects closer than this on the sky 
cannot be observed on the same slitmask (except in the 
special case of very close and appropriately aligned neigh- 
bors, which can both be observed on a single slit). This 
problem is mitigated somewhat by the adaptive scheme 
for tiling the DEEP2 CFHT imaging with slitmasks, 
which gives nearly every target at least two chances to 
be observed; nevertheless, slit collisions cause us to be 
Fig. 2. Distribution of group redshifts in a mock DEEP2
field (120 × 30 arcmin). Left: The upper panel shows redshift
distributions for all groups that enter a given catalog with richness
N > 2. The solid line shows the distribution for the volume-limited
catalog, the dotted line shows the distribution for the magnitude-limited
catalog, and the dashed line shows the distribution for the
masked catalog (see Section 2.2 for the definitions of these mock
samples). The apparent decrease in group abundances at low redshifts
arises from the redshift limit z > 0.7, which is intended to mimic
the DEEP2 photometric selection criteria. The middle panel shows
the distributions for groups entering each catalog with richness
N > 2, and the bottom panel
shows the distributions for groups with N > 4. Right: Redshift
distributions for the same three catalogs, for groups with velocity
dispersion above some threshold σ_c. From top to bottom, the
panels represent σ_c = 200, 400 and 600 km s^-1. Note that, when
groups are selected by velocity dispersion, the discrepancy between
the three catalogs decreases as σ_c increases, whereas the discrepancy
increases with richness.



biased against observing objects that are strongly clus- 
tered on the sky. Moreover, the quality of DEIMOS spec- 
tra is degraded somewhat for short slit lengths, due to 
the difficulty of subtracting night-sky emission for such 
slits, so we might expect a lower redshift success rate for 
clustered objects. 

Figure 1 shows the probabilities of observation and
redshift success as functions of the distance to an ob- 
ject's third-nearest neighbor on the sky. (We have chosen 
the third-nearest neighbor distance because this is a less 
noisy measure of local density than the simple nearest- 
neighbor distance.) Clearly we are less likely to observe 
galaxies in dense regions on the sky, though this effect is 
relatively weak, and local density appears to have little 
effect on the redshift success rate. Moreover, as shown 
in the figure, the vast majority of DEEP2 targets have 
neighbors on the sky at distance scales smaller than a 
typical cluster core radius (~ 300 kpc). Since we ex- 
pect a much smaller percentage of galaxies to actually 
reside in cluster cores, we conclude that a given galaxy's 
close neighbors are frequently in the foreground or back- 
ground. Hence, although we clearly undersample galax- 
ies in dense regions on the sky, we are not necessarily 
undersampling galaxies in dense regions in three-space. 
Nevertheless, all of the effects discussed in this section, 
taken together, mean that nearly all DEEP2 groups will 
have fewer than ten members (see Table 

The galaxies in each group will thus represent a very
sparse, discrete sampling of the membership of each
group. It is well known that large errors can result
when the moments of a distribution are estimated from
a sparse sample. In particular, computing velocity dispersions
with the usual formula for standard deviation,

\[
\sigma = \left\langle \left( v - \langle v \rangle \right)^2 \right\rangle^{1/2},
\]

will be an unreliable method for such small groups.
Beers et al. (1990) have studied this issue
in the context of galaxy clusters. They assess a number
of alternative dispersion estimators and determine
the most accurate ones for different ranges in group richness.
For the richness range of interest here, N ~ 5, they
find the most robust method to be the so-called "gapper"
estimator, which measures velocity dispersion using the
velocity gaps in a sample according to the formula

\[
\sigma_G = \frac{\sqrt{\pi}}{N(N-1)} \sum_{i=1}^{N-1} i\,(N-i)\,(v_{i+1} - v_i),
\]

where the line-of-sight velocities v_i have been sorted into
ascending order. Since we expect this estimator to be
more accurate than the standard deviation for our purposes,
we will measure velocity dispersions as σ = σ_G
throughout this paper. Furthermore, in this paper we
shall always compute velocity dispersions using the galax- 
ies in a given sample. Correcting these values to reflect 
the velocity dispersions of dark matter halos will ulti- 
mately be necessary for comparison with predictions, but 
we focus here on measureable quantities and defer this 
(theoretical) issue to future work. 
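
For concreteness, a minimal Python sketch of the gapper estimator is given below, in the form in which it is commonly written following Beers et al. (1990): the sorted velocity gaps v_(i+1) - v_i are weighted by i(N - i) and the sum is scaled by sqrt(π)/[N(N - 1)]. The function name and the example velocities are hypothetical and are not taken from the DEEP2 analysis.

```python
import math
import statistics


def gapper_dispersion(velocities):
    """Gapper estimate of velocity dispersion from line-of-sight velocities.

    The velocities are sorted into ascending order, each gap between
    consecutive velocities is weighted by i * (N - i), and the weighted sum
    is multiplied by sqrt(pi) / (N * (N - 1)).
    """
    v = sorted(velocities)
    n = len(v)
    if n < 2:
        raise ValueError("at least two velocities are required")
    weighted_gaps = sum(i * (n - i) * (v[i] - v[i - 1]) for i in range(1, n))
    return math.sqrt(math.pi) / (n * (n - 1)) * weighted_gaps


# Made-up velocities (km/s) for a five-member group, for illustration only.
example = [-310.0, -120.0, 15.0, 180.0, 420.0]
print("gapper dispersion :", gapper_dispersion(example))
print("standard deviation:", statistics.stdev(example))
```

Because the estimator depends only on the gaps between sorted velocities, it is unaffected by a constant offset such as the mean recession velocity of the group.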

The effects of the DEEP2 target selection criteria can
be seen in Figure 2. The upper left-hand panel shows the
redshift distribution of groups in a single mock DEEP2
pointing, drawn at random from the mock catalogs, for
the volume-limited, magnitude-limited, and masked samples.
Here a group is defined to be the set of all galaxies
in a given sample that occupy a common dark matter
halo, and a group's redshift is given by the median
redshift of its member galaxies. The remaining panels
show subsets of these three group catalogs, containing
groups above a given threshold in richness N or line-of-sight
velocity dispersion σ. It is worth noting briefly
that, in some redshift bins, the masked sample has more
groups than the magnitude-limited sample from which
it is drawn. This effect is easy to understand: it occurs
when group members are discarded, moving the median
redshifts of some groups from one bin to another. The
important point, however, is that when the σ threshold
is increased, the discrepancies become smaller between
the volume-limited, magnitude-limited and masked samples.
On the other hand, these discrepancies increase
when the richness threshold is increased: we note in particular
the sharp drop-off in groups with N > 4 between
the magnitude-limited and masked samples. Evidently
groups selected according to observed richness constitute
a significantly biased sample, whereas groups selected by
observed velocity dispersion can provide a more accurate
representation of the full underlying sample.
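
The definition of a "true" group used for the mock catalogs (all galaxies in a sample that share a dark matter halo, with the group redshift taken as the median member redshift) can be sketched as follows. This is an illustrative sketch only: the input format of (halo identifier, redshift) pairs, the function name, and the `min_members` parameter are our assumptions, not part of the mock-catalog software.

```python
from collections import defaultdict
from statistics import median


def build_groups(galaxies, min_members=2):
    """Collect galaxies by host halo and summarize each resulting group.

    `galaxies` is an iterable of (halo_id, redshift) pairs. Halos contributing
    at least `min_members` galaxies are returned as groups, each with a
    richness and a median redshift; halos contributing a single galaxy are
    returned separately as isolated (field) galaxies.
    """
    members = defaultdict(list)
    for halo_id, z in galaxies:
        members[halo_id].append(z)

    groups = {
        halo_id: {"richness": len(zs), "z": median(zs)}
        for halo_id, zs in members.items()
        if len(zs) >= min_members
    }
    field = [halo_id for halo_id, zs in members.items() if len(zs) == 1]
    return groups, field


# Made-up sample: halo 1 hosts three galaxies, halo 2 one, halo 3 two.
sample = [(1, 0.80), (1, 0.82), (1, 0.81), (2, 0.90), (3, 1.00), (3, 1.20)]
groups, field = build_groups(sample)
print(groups)
print(field)
```

Removing members from the input (as the magnitude limit and slitmask selection do) lowers the richness of a group and can shift its median redshift, which is the bin-shifting effect noted above.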

This result is not surprising. Velocity dispersion is
known to scale with halo mass roughly as σ ∝ M^(1/3)
(Bryan & Norman 1998), and richness should also scale
with M. In a magnitude-limited sample, measured group
richnesses will be affected by the flux limit so that more 
distant groups will have fewer observed members: for 
example, a group observed to have three members at 
Fig. 3. Effects of DEEP2 selection on group velocity dispersion
σ. The upper panel shows the n(σ) distribution for groups in
each of the volume-limited, magnitude-limited and masked samples
in a single mock DEEP2 field. Note that, as suggested by
Figure 2, the three distributions are similar at high velocity dispersions.
The lower panel shows how individual groups' velocity
dispersions change when the DEEP2 slitmask-making algorithm is
applied. Crosses show the dispersions of individual groups computed
from the galaxies present before and after mask-making;
open squares and error bars show the mean and standard deviation
of the masked σ value in bins of 100 km s^-1 in magnitude-limited
σ value. The dashed line is the line of equality for the pre- and
post-maskmaking velocity dispersions. A majority (57%) of the
groups plotted fall exactly on this line.



z = 1.3 will actually contain significantly more galaxies
than a three-member group observed at z = 0.7.
However, sufficiently massive (i.e., high-dispersion and 
-richness) groups are nearly certain to enter the observed 
catalog with more than one member at all redshifts — and 
hence to be identifiable as groups. Selection effects that 
reduce the number of galaxies observed in each group 
will introduce a scatter in the measured velocity disper- 
sions of individual groups. But above some appropriate 
critical dispersion, a c we expect the observed distribution 
of group velocity dispersions, n(a), to resemble the true 
one. In Figure 3, we see that this expectation is borne 
out in the DEEP2 mock catalogs. Although a significant 
scatter exists in measured group velocity dispersions, the 
agreement between the n(a) distributions for the three 
mock catalogs improves with increasing a. For these rea- 
sons, we expect that it will be possible to identify a ro- 
bust sample of DEEP2 groups whose n(a) distribution 
is not strongly biased by observational effects. 

3. DEFINING THE OPTIMAL GROUP CATALOG 

Because we expect any group-finding algorithm to be 
prone to many different types of error, it is crucial that 
we define carefully our tolerance for various errors and 
craft a specific definition of group-finding "success." To 



begin with, we must establish what we mean when we 
speak of a galaxy group. As already noted briefly in an
earlier section, in the spirit of the halo model, we define galaxy
groups in terms of dark matter halos. We define a parent 
halo to be a single, virialized halo that contributes one or 
more galaxies to our sample; the contributed galaxies we 
call the halo's daughter galaxies. A group is then defined 
to be a set of (two or more) galaxies that comprises the 
daughter galaxies of a single parent halo. Field galaxies 
are those galaxies that constitute the lone daughters of 
their respective parent halos. These definitions are con- 
venient because cosmological tests based on cluster abun- 
dance are in reality concerned with the abundance of viri- 
alized dark matter halos; we wish to infer the presence of 
such objects from the clustering of galaxies. In applying 
this definition we consider to be separate groups those 
halos that are not virialized with respect to each other in 
a common potential well, but we make no distinction be- 
tween subhalos within a larger, common virialized halo. 
It will also be necessary in what follows to differentiate 
between real groups — those sets of galaxies that actually 
share the same underlying dark matter halo — and recon- 
structed groups — the sets of galaxies identified as groups 
by the group finder. 

The ideal reconstructed group catalog would be one 
in which (i) all galaxies that belong to real groups are 
identified as group members, (ii) no field galaxies are 
misidentified as group members, (iii) all reconstructed
groups are associated with real, virialized dark-matter 
halos, (iv) all real groups are identified as distinct ob- 
jects, and (v) these objects contain all of their daughter 
galaxies and no others. As discussed in Section 2.3, however,
such a catalog is impossible to achieve because of
redshift-space distortions and incomplete sampling of the
galaxy population. Nevertheless, this ideal will be useful 
as a means of assessing the veracity of our group cat- 
alog. It is thus important to define a vocabulary with 
which to compare our group catalog to the ideal one. 
We shall make frequent use of the following definitions: 
a group catalog's galaxy-success rate $S_{\rm gal}$ is the fraction
of galaxies belonging to real groups that are identified as 
members of reconstructed groups. Interlopers are field 
galaxies that are misidentified as group members in the 
reconstructed catalog, and the interloper fraction $f_I$ of a
group catalog is the fraction of reconstructed group mem- 
bers that are interlopers. The completeness C of a group 
catalog is the fraction of real groups that are successfully 
identified in the reconstructed catalog (we shall define 
what it means to be "successfully identified" shortly); 
conversely, the purity P is the fraction of reconstructed 
groups that correspond to real groups. Fragmentation 
occurs when a real group is identified as several smaller 
groups in the reconstructed catalog, and over-merging 
occurs when two or more real groups are identified as a 
single reconstructed object. 
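
As an illustration of the two galaxy-level statistics just defined, the
following minimal Python sketch computes $S_{\rm gal}$ and $f_I$ for a
pair of toy catalogs. It assumes (for illustration only) that each
catalog is stored as a list of sets of galaxy IDs and that any galaxy
not listed in a real group is a field galaxy; the function name and data
layout are hypothetical, not part of the DEEP2 pipeline.

```python
def galaxy_success_and_interlopers(real_groups, found_groups):
    """Return (S_gal, f_I) for catalogs given as lists of sets of galaxy IDs.

    real_groups  : real groups (two or more daughter galaxies each)
    found_groups : reconstructed groups returned by a group finder
    Assumption: galaxies absent from every real group are field galaxies.
    """
    real_members = set().union(*real_groups) if real_groups else set()
    found_members = set().union(*found_groups) if found_groups else set()

    # S_gal: fraction of real-group galaxies placed in any reconstructed group.
    s_gal = (len(real_members & found_members) / len(real_members)
             if real_members else 0.0)
    # f_I: fraction of reconstructed-group members that are field galaxies.
    f_int = (len(found_members - real_members) / len(found_members)
             if found_members else 0.0)
    return s_gal, f_int


# Toy example: galaxy 3 is missed; galaxies 6 and 9 are interlopers.
real = [{1, 2, 3}, {4, 5}]
found = [{1, 2, 9}, {4, 5, 6}]
s_gal, f_int = galaxy_success_and_interlopers(real, found)
```

In this toy case four of the five real group members are recovered
and two of the six reconstructed members are interlopers.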

Since a perfect group catalog is impossible to achieve, 
we shall focus our efforts on reproducing certain selected 
group properties as accurately as possible. There are 
many properties we could choose to reproduce for differ- 
ent scientific purposes; each choice has advantages and 
drawbacks. We could, for example, choose to maximize 
$S_{\rm gal}$, thus ensuring that our group catalog contains all
galaxies that belong to real groups. Such a sample would 
likely have a high interloper fraction and much over- 

Fig. 4. Schematic depiction of various success and failure
modes for group-finding under the criteria discussed in the text.
Diagrams show hypothetical comparisons between real groups
(unprimed) and the groups found by a group finder (primed). (A) A
fully two-way successful reconstruction, in which both the real and
found group have $L_G > 0.5$. (B) A one-way success, in which the
real group has $L_G > 0.5$, but half of the found group is made up of
interlopers. (C) Both a two-way success (G and G') and a failure
due to fragmentation (G''). (D) An example of over-merging, in
which G and H are both one-way successful real groups, but G'
combines their members into a single group.



merging, however (for example, the easiest way to ensure
$S_{\rm gal} = 1$ would be simply to place all galaxies in
the sample into a single group). Conversely, a group catalog
that minimizes the interloper fraction would likely
be highly incomplete, successfully finding only the cores
of the largest groups. Such catalogs might be useful for 
studies of the properties of galaxies in groups, but they 
are unlikely to be of much use for studying cosmology or 
large-scale structure. 

A different approach is to gauge success on a group- 
by-group basis and attempt to maximize completeness, 
purity, or both. To do this, we must develop a quantita- 
tive measure of our success at reconstructing individual 
groups; we will use the concept of the Largest Group 
Fraction (LGF; cf. Marinoni et al. 2002 and references
therein). To compute the LGF for a given real group
G, we first find the reconstructed group G' that contains
a plurality of the galaxies in G (the fact that this
is not necessarily unique does not concern us, since we 
will eventually require a majority for a successful recon- 
struction). The group G' we call the Largest Associated 
Group (LAG) of G. The LGF $L_G$ of group G is then
defined as 

$$ L_G = \frac{N(G \cap G')}{N(G)}, $$



where the notation N(A) denotes the number of galaxies 
in the set A. That is, the LGF is the fraction of group 
G that is contained in its LAG G' . The LGF of a re- 
constructed group is defined similarly, but with G being 
drawn from the reconstructed catalog and its LAG G' 
being drawn from the real catalog. It should be mentioned
here that $L_G$ can only be measured for groups
in mock catalogs, where we know the real group mem- 
berships. In all further discussion of tests involving the 
LGF, it should be assumed that these tests take place in 
mock catalogs. 
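
The LGF and LAG can be computed directly from group membership lists.
The sketch below is a minimal illustration under assumed conventions:
groups are Python sets of galaxy IDs, a catalog is a list of such sets,
and ties for the plurality are broken by taking the first candidate.
The function name is hypothetical.

```python
def largest_group_fraction(group, other_catalog):
    """Return (LGF, index of LAG) of `group` relative to `other_catalog`.

    group         : set of galaxy IDs
    other_catalog : list of sets of galaxy IDs (the catalog to match against)
    The LAG is the group in `other_catalog` sharing the most galaxies with
    `group`; if nothing overlaps, the LGF is 0 and the LAG index is None.
    """
    best_index, best_overlap = None, 0
    for i, candidate in enumerate(other_catalog):
        overlap = len(group & candidate)   # N(G intersect G')
        if overlap > best_overlap:
            best_index, best_overlap = i, overlap
    return best_overlap / len(group), best_index


# Three of the four members of the real group lie in found[0].
real_g = {1, 2, 3, 4}
found = [{1, 2, 3, 7}, {4, 8}]
lgf, lag = largest_group_fraction(real_g, found)
```

The same function gives the LGF of a reconstructed group if the roles of
the two catalogs are swapped.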

The LGF statistic allows us to define an unambiguous 
set of group-finding success measures. In essence, we 



declare a successful detection if a group's $L_G$ is greater
than some fraction $f$. For this definition to be unique,
we must have $f \geq 0.5$, with higher values of $f$ implying a
more stringent definition of success. For the remainder of
this work, we will set $f$ to the minimal value of 0.5, since,
as we shall see, this definition of success is already quite
strict. However, simply requiring a group to have $L_G > f$
is insufficient: a real group could meet this criterion
but still have been merged into a larger object by the
group-finder, and a reconstructed group G' could have $L_{G'} > f$
if it is a fragment of a larger real group. For this
reason, it will be important to differentiate between
one-way matches, in which a group simply has LGF above
$f$, and two-way matches, in which a group G and its
LAG G' satisfy $L_G, L_{G'} > f$, and G is also the LAG
of G' (see Figure 4 for a schematic depiction of these
success measures). Hence we shall differentiate between
one-way purity $P_1$, the fraction of reconstructed groups
with $L_G > f$, and two-way purity $P_2$, the fraction of
reconstructed groups which are two-way matches with
some real group. Similarly, for real groups, we define
one-way completeness $C_1$ and two-way completeness $C_2$.
Comparing these statistics can give some indication of
systematic errors in the group catalog. A real group
that is a one-way success but not a two-way success has
likely been overmerged by the group finder; therefore if
$C_1$ is much larger than $C_2$ we expect that our catalog has
been highly over-merged. Similarly, if $P_1$ is significantly
greater than $P_2$, we expect that our catalog is highly
fragmented.
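
The one-way and two-way statistics can be sketched in a few lines of
Python. This is an illustrative sketch only, again assuming catalogs
stored as lists of sets of galaxy IDs; the helper and function names are
hypothetical. Calling the function with (real, found) returns
$(C_1, C_2)$, and calling it with (found, real) returns $(P_1, P_2)$.

```python
def _lgf_and_lag(group, catalog):
    """LGF of `group` and index of its LAG in `catalog` (None if no overlap)."""
    overlaps = [len(group & other) for other in catalog]
    if not overlaps or max(overlaps) == 0:
        return 0.0, None
    best = max(range(len(catalog)), key=overlaps.__getitem__)
    return overlaps[best] / len(group), best


def match_fractions(catalog_a, catalog_b, f=0.5):
    """Fractions of groups in catalog_a that are one-way and two-way matches."""
    if not catalog_a:
        return 0.0, 0.0
    one_way = two_way = 0
    for i, group in enumerate(catalog_a):
        lgf, lag = _lgf_and_lag(group, catalog_b)
        if lgf <= f:
            continue                      # not even a one-way match
        one_way += 1
        # Two-way: the LAG must also have LGF > f and point back to this group.
        back_lgf, back_lag = _lgf_and_lag(catalog_b[lag], catalog_a)
        if back_lgf > f and back_lag == i:
            two_way += 1
    return one_way / len(catalog_a), two_way / len(catalog_a)


# Toy catalogs: the first two real groups are merged by the group finder.
real = [{1, 2, 3}, {4, 5, 6}, {7, 8}]
found = [{1, 2, 3, 4, 5, 6}, {7, 8, 9}]
c1, c2 = match_fractions(real, found)   # completeness
p1, p2 = match_fractions(found, real)   # purity
```

In this toy example every real group is a one-way success, but only one
of the three is a two-way success, which is the over-merging signature
described above.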

Our definition of success has another potential prob- 
lem, however: it requires, minimally, only that we re- 
construct half of each group. Thus, a "successful" search 
strategy could seek only the most tightly clustered sets of 
galaxies and detect only the cores of groups and clusters. 
Such a group catalog would likely be of high purity, with 
few interlopers; it could be useful for identifying groups 
for follow-up observation in X-ray or Sunyaev-Zeldovich 
surveys. But it would likely have low completeness, and 
it probably would not accurately reproduce group prop- 
erties like richness, physical size, or velocity dispersion, 
making estimates of cluster mass impossible with spec- 
troscopic data alone. 

In order to mitigate such difficulties, we must also 
monitor our success in reproducing group properties. In 
part, this means we should attempt to accurately mea- 
sure properties like the velocity dispersion of successfully 
reconstructed groups on a group-by-group basis. How- 
ever, since errors in individual group detections are in- 
evitable with any group finder, we must also determine 
whether these errors bias the overall distribution of group 
properties in our catalog. Ultimately it is these statisti- 
cal distributions we will want to reproduce as accurately 
as possible. For example, if we wish to study the abun- 
dance of groups as a function of redshift, n(z), we must 
take care to ensure that spurious group detections and 
undetected real groups do not skew this distribution. 

Clearly, then, there are many different possible means 
by which we could gauge our success at group finding. 
As we have said, our chosen measure of success will de- 
pend strongly on the ultimate scientific purposes of our 
group catalog. In our case, among other uses, we envision
using the DEEP2 group catalog to constrain cosmological
parameters. Newman et al. (2002) have shown
that DEEP2 groups can be used for this purpose if their
abundance is measured accurately as a bivariate distribution
in velocity dispersion and redshift, $n(\sigma, z)$. It has
also been shown (Marinoni et al. 2002) that the
Voronoi-Delaunay group-finding algorithm can successfully
reconstruct this distribution (down to some limiting velocity
dispersion $\sigma_c$); hence we will seek in this study to
maximize the accuracy of our reconstructed $n(\sigma, z)$ above
some $\sigma_c$. Of course, it would be possible in principle to
reproduce this distribution by chance with a low-purity,
low-completeness catalog. Therefore, we will simultaneously
strive to maximize the completeness and purity
parameters, while also taking care to keep $C_1 \approx C_2$
and $P_1 \approx P_2$ to guard against fragmentation and
over-merging. Indeed, it is always important to monitor these
statistics in order to ensure that our reconstructed group
catalog corresponds reasonably well with reality. We do
not actively monitor the $S_{\rm gal}$ or $f_I$ parameters when
optimizing our group finder, but we anticipate that a
catalog that meets our success criteria will also be of
reasonably high quality by these measures (this
is borne out in Section 4.5).

In concluding this section, it is important to note that 
when we speak of group velocity dispersions or redshifts 
in this paper, we are talking only about the properties as 
computed from the observed group members. Although 
we will ultimately be interested in the properties of dark 
matter halos (which can be predicted theoretically and 
used to constrain cosmology), these cannot be measured 
directly, even in principle. Even with a completely error- 
free group catalog, a theoretical correction would have 
to be applied to account for the effects of discreteness. 
Thus we will be interested in reconstructing $n(\sigma, z)$ as
computed using observed galaxies only. We make no at- 
tempt to reconstruct the distribution as computed for the 
dark matter or using unobserved galaxies, and we leave 
computation of theoretical correction factors to future 
work. 

4. CHOOSING AND OPTIMIZING THE GROUP-FINDER 

Several different techniques have been developed to 
find groups in spectroscopic redshift samples. We review 
them briefly here, as a means of introducing the main 
issues that will concern us in selecting a group-finding 
algorithm. 

4.1. A brief history of group finding 

Huchra & Geller (1982) presented a simple early
method for identifying groups and clusters in the Center
for Astrophysics (CfA) redshift survey by looking for
nearby neighbor galaxies around each galaxy. Commonly
known as the friends of friends or percolation method,
this technique, in its simplest form, defines a linking
length $b$ and links every galaxy to those neighboring
galaxies a distance $b$ or less away ("friends"). This
procedure produces complexes of galaxies linked together
via their neighbors ("friends of friends"); these complexes
are identified as groups and clusters. Versions of this
algorithm have been widely used to identify groups in
local redshift surveys (most recently by Eke et al.
in 2dFGRS), and percolation techniques have also long
been used to identify virialized dark-matter halos within
N-body simulations. The percolation algorithm is
intuitively attractive because it identifies those regions with
an overdensity $\delta > (2\pi b^3/3)^{-1}$ compared to the
background density. The overdensity $\delta_v$ of virialized objects
can be readily computed using the well-known spherical
collapse model, yielding an appropriate linking length
of $b = 0.2\,\langle\nu\rangle^{-1/3}$ for identifying virialized objects in
an Einstein-de Sitter universe (Davis et al. 1985), where
$\langle\nu\rangle$ is the mean spatial number density of galaxies (this
linking length is somewhat smaller for a $\Lambda$CDM model,
a point which has frequently been ignored in the
literature). Hence, the percolation algorithm is a natural
method for identifying virialized structures in the
absence of redshift-space distortions.
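
The percolation procedure described above can be sketched compactly. The
following Python example is a minimal real-space illustration, not the
implementation used in any of the surveys cited: it assumes Cartesian
positions, a single isotropic linking length, and a brute-force
$O(N^2)$ pair search with a union-find structure. All names are
hypothetical.

```python
import math


def friends_of_friends(points, b):
    """Group 3-D points by percolation with linking length b.

    Returns a list of groups, each a sorted list of point indices.
    Brute-force pair search; adequate for a small illustrative sample.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= b:   # i and j are "friends"
                parent[find(i)] = find(j)              # merge their complexes

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Points 0 and 2 are farther apart than b but are linked through point 1.
pts = [(0, 0, 0), (0.5, 0, 0), (1.0, 0, 0), (5, 5, 5), (5.4, 5, 5), (9, 0, 0)]
fof_groups = friends_of_friends(pts, b=0.6)
```

Singletons in the output correspond to isolated points; in a group
catalog only complexes with two or more members would be retained.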

Unfortunately, working in redshift space can cause serious
problems for this algorithm. The fingers-of-God
effect requires that we stretch the linking volume into 
an ellipsoid or cylinder along the line of sight, which 
increases the possibility of spurious links. Because the 
percolation method considers each galaxy equally while 
creating links, then places all linked galaxies into a given 
group or cluster, such false links can lead to catas- 
trophic failures, in which the group finder "hops" be- 
tween several nearby groups, merging them together into 
a single, falsely detected massive cluster. On the other 
hand, shrinking the linking volume to avoid this prob- 
lem increases the chances that a given structure will be 
fragmented into several smaller structures by the group 
finder or missed entirely. These problems have been studied
in detail by Nolthenius & White (1987) and more recently
by Frederic (1995).

To combat such difficulties, various other group-finding
methods have been developed. Tully (1980, 1987) used
the so-called "hierarchical" group-finding scheme,
originally introduced by Materne (1978), to find nearby
groups. The hierarchical grouping procedure used
is computationally interesting, but in the context of
the current model of structure formation it seems to
lack theoretical motivation. More recently, the SDSS
team has introduced a group-finding algorithm called
C4 (Nichol 2004), which searches for clustered galaxies
in a seven-dimensional space, including the usual
three redshift-space dimensions and four photometric
colors, on the principle that galaxy clusters should
contain a population of galaxies with similar observed
colors. Kepner et al. (1999) introduced a three-dimensional
"adaptive matched filter" algorithm which identifies clusters
by adding "halos" to a synthesized background mass
density and computing the maximum-likelihood mass
density. White & Kochanek (2002) found that this
algorithm is extremely successful at identifying clusters in
spectroscopic redshift surveys, and recently, Yang et al.
(2004) have introduced a group-finder that combines
elements of the matched filter and percolation
algorithms. Finally, Marinoni et al. (2002) developed a
group-finding algorithm, the Voronoi-Delaunay Method
(VDM), that makes use of the Voronoi partition and
Delaunay triangulation of a galaxy redshift survey to
identify high-density regions. By performing a targeted,
adaptive search in these regions, the VDM avoids many
of the pitfalls of simple percolation methods; we will use
a version of it in this study. We note in passing,
however, that the matched-filter algorithm is also attractive
for DEEP2, and we plan to explore its usefulness in
future studies.

Fig. 5. The two-dimensional Voronoi partition (Dirichlet
tessellation) and Delaunay mesh for an array of points. The points
consist of a randomly generated uniform background (triangles) and a
small, tightly clustered group of points (squares) that roughly
approximate a galaxy group. Dotted lines show the Delaunay mesh,
which connects each point to its nearest neighbors. The solid lines
delineate the edges of the Voronoi polygons, which are the
perpendicular bisectors of the Delaunay links. Note that each polygon
contains only one point, and that the typical Voronoi cell is smaller
for the grouped points than for the background points.



4.2. The Voronoi-Delaunay method

Marinoni et al. (2002) showed that the VDM successfully
reproduces the distribution of groups in velocity
dispersion in a DEEP2-like sample down to some minimum
dispersion $\sigma_c$. The algorithm makes use of the
three-dimensional Voronoi partition of the galaxy redshift
catalog, which tiles space with a set of unique polyhedral
subvolumes, each of which contains exactly one galaxy and
all points closer to that galaxy than to any other. The
Voronoi partition naturally provides information about 
the clustering properties of galaxies, since galaxies with 
many neighbors will have small Voronoi volumes, while 
relatively isolated galaxies will have large Voronoi vol- 
umes. The algorithm also makes use of the clustering 
information encoded in the Delaunay mesh, which is a 
complex of line segments linking neighboring galaxies. 
Mathematically speaking, the Delaunay mesh is the ge- 
ometrical dual of the Voronoi partition; the faces of the 
Voronoi cells are the perpendicular bisectors of the lines 
in the Delaunay mesh. A two-dimensional visual representation
of the Voronoi partition and Delaunay mesh is
shown in Figure 5.

The VDM group-finding algorithm proceeds iteratively 
through the galaxy catalog in three phases as follows. In 
Phase I, all galaxies that have not yet been assigned to 
groups are sorted in ascending order of Voronoi volume. 
This is mainly a time-saving step, because it allows us 
to begin our group search with those galaxies in dense 
regions. Then, for the first galaxy in this sorted list (the 
seed galaxy), we define a relatively small cylinder of radius
$\mathcal{R}_{\min}$ and length $2\mathcal{L}_{\min}$, oriented with its axis along
the redshift direction. The dimensions of this and other
search cylinders are computed using comoving
coordinates.$^9$ Within this cylinder, we find all galaxies that
are connected to the seed galaxy by the Delaunay mesh
(the first-order Delaunay neighbors). If there are no such
galaxies, the seed galaxy is said to be isolated, and the al- 
gorithm moves on to the next seed galaxy in the list. By 
initially searching in a small cylinder, we are able to limit 
the probability of chance associations being misidentified 
as groups. 

If, however, there are one or more first-order Delaunay
neighbors, we move on to Phase II. We define a
second, larger cylinder, concentric with the first one,
with radius $\mathcal{R}_{II}$ and length $2\mathcal{L}_{II}$. Within this cylinder,
we identify all galaxies that are connected to the seed
galaxy or to its first-order Delaunay neighbors by the
Delaunay mesh. These are the second-order Delaunay
neighbors. The seed galaxy and its first- and second-order
Delaunay neighbors constitute a set of $N_{II}$ galaxies;
we take $N_{II}$ to be an estimate of the central richness
of the group. Scaling relations are known to exist
between group mass and radius and velocity dispersion
(Bryan & Norman 1998), and between velocity dispersion
and central richness (Bahcall 1981). Thus we may
estimate the final size of the group from $N_{II}$. In particular,
we expect that $N_{II} \propto M \propto \sigma^3 \propto R^3$.

Therefore in Phase III we define a third cylinder, centered
on the center of mass of the $N_{II}$ galaxies from
Phase II, with radius and half-length given by

$$ \begin{aligned}
\mathcal{R}_{III} &= r\,(N_{II}^{\rm corr})^{1/3} \\
\mathcal{L}_{III} &= \ell\,(N_{II}^{\rm corr})^{1/3},
\end{aligned} \tag{3} $$

where $r$ and $\ell$ are free parameters that must be
optimized. Here, the corrected central richness $N_{II}^{\rm corr}$ is
scaled to account for the redshift-dependent number density
$\nu(z)$ of galaxies in a magnitude-limited survey:

$$ N_{II}^{\rm corr} \propto \frac{N_{II}}{\langle\nu(z)\rangle}. \tag{4} $$

We compute $\langle\nu(z)\rangle$ by smoothing the redshift distribution
of the entire galaxy sample and dividing it by the
differential comoving volume element $dV/dz$ to yield the
comoving number density. All galaxies within the Phase
III cylinder (and any of the $N_{II}$ galaxies from Phase II
that happen to fall outside of it) are taken to be members
of the group. After a group has been identified, this
three-phase process is repeated on all remaining galaxies
that have not yet been assigned to groups until all galaxies
have either been placed into groups or explicitly
identified as isolated galaxies.
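
The three phases can be summarized in a short Python sketch. This is an
illustrative skeleton under several stated assumptions, not the code
used in this study: galaxy positions are given as flat-sky comoving
coordinates (two transverse, one along the line of sight); Voronoi
volumes and first-order Delaunay neighbor lists are supplied as
precomputed inputs; the center of mass is taken as the unweighted
centroid; the density correction enters as a single assumed factor
`density_ratio`; the redshift rescaling of cylinder lengths is omitted;
and a galaxy flagged as isolated is not revisited. Default parameter
values are the "optimal" column of Table 2, and all function and
argument names are hypothetical.

```python
import math


def in_cylinder(center, point, radius, half_length):
    """True if `point` lies in a cylinder about `center` (axis = line of sight)."""
    dx, dy, dz = (p - c for p, c in zip(point, center))
    return math.hypot(dx, dy) <= radius and abs(dz) <= half_length


def vdm_groups(pos, volume, neighbors, r_min=0.3, l_min=7.8,
               r_two=0.5, l_two=6.0, r=0.35, ell=14.0, density_ratio=1.0):
    """Sketch of the three VDM phases; returns (groups, isolated)."""
    unassigned = set(range(len(pos)))
    groups, isolated = [], []
    # Phase I: visit seeds in ascending order of Voronoi volume (densest first).
    for seed in sorted(unassigned, key=lambda i: volume[i]):
        if seed not in unassigned:
            continue
        first = {j for j in neighbors[seed] if j in unassigned
                 and in_cylinder(pos[seed], pos[j], r_min, l_min)}
        if not first:
            isolated.append(seed)          # no first-order neighbors nearby
            unassigned.discard(seed)
            continue
        # Phase II: neighbors of the seed or of its first-order neighbors
        # that fall inside the larger, concentric cylinder.
        linked = set(neighbors[seed]).union(*(neighbors[j] for j in first))
        second = {j for j in linked if j in unassigned and j != seed
                  and in_cylinder(pos[seed], pos[j], r_two, l_two)}
        core = {seed} | first | second
        n_corr = density_ratio * len(core)  # assumed form of the correction
        # Phase III: cylinder scaled as N^(1/3), centered on the core centroid.
        center = tuple(sum(pos[i][k] for i in core) / len(core) for k in range(3))
        scale = n_corr ** (1.0 / 3.0)
        members = core | {j for j in unassigned
                          if in_cylinder(center, pos[j], r * scale, ell * scale)}
        groups.append(sorted(members))
        unassigned -= members
    return groups, isolated


# Toy input: three galaxies strung along the line of sight plus one far away.
pos = [(0, 0, 0), (0.1, 0, 1.0), (0.2, 0.1, -2.0), (10, 10, 50)]
volume = [1.0, 1.2, 1.5, 80.0]
neighbors = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
groups, isolated = vdm_groups(pos, volume, neighbors)
```

In practice the Voronoi volumes and Delaunay links would come from a
computational-geometry library applied to the three-dimensional
redshift-space catalog; they are hard-coded here only to keep the
example self-contained.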

The astute reader may object here that we have used
the central richness $N_{II}$ to scale our search window, even
though we found earlier that richness is a relatively
unstable group property within the DEEP2 sample. This
is true. However, the groups in our mock catalogs do
show some correlation between actual and observed
richness, even though the scatter is very large. Furthermore,
we have mitigated one major source of error, Malmquist
bias, with the correction in equation (4). Since the
dependence of our scaling on $N_{II}$ is relatively weak, we
anticipate that errors in estimating this quantity will not
introduce insurmountably large errors into our group sample.
This expectation is borne out by tests on mock catalogs,
as will be seen below.

$^9$ One might naively expect to use physical coordinates to find
virialized objects like clusters, but because the background density
scales as $\rho_b \propto (1+z)^3$, dark matter halos of a given mass have
virial radii that scale roughly as $R_{\rm vir} \propto (1+z)^{-1}$. Hence, clusters
of fixed mass have radii that are roughly constant in comoving
coordinates.

Finally, it is important to note some minor differences
between our group-finder and the one described in
Marinoni et al. (2002). In that paper, the scaling factors
$r$ and $\ell$ in Equation (3) were derived iteratively by
running the group-finder first with a best-guess parameter
set and then automatically adjusting parameters according
to the largest groups found. In tests on mock catalogs,
we found this method to be unstable, so we instead
choose to optimize our parameters empirically with mock
catalogs and then leave them fixed. Also, when we search
for groups in cylinders, it is important to note that the
"length" of our cylinders is supposed to correspond to an
expected maximum velocity of the galaxies in the group.
Since the mapping between redshift interval and peculiar
velocity changes with redshift, we must rescale the
length of our search cylinders as
$\mathcal{L}(z) = [s(z)/s(z_0)]\,\mathcal{L}_0$,
where the scaling factor $s(z)$ is given by

$$ s(z) = \frac{1+z}{\sqrt{\Omega_M (1+z)^3 + \Omega_\Lambda}} $$

for the standard $\Lambda$CDM cosmology. This scaling
amounts to a ~10% effect over the redshift range of
the DEEP2 survey. We apply it to the cylinder in each
phase, taking a reference redshift of $z_0 = 0.7$.
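
To make the size of this rescaling concrete, the short Python sketch
below evaluates it numerically. It rests on two assumptions that are not
stated explicitly in the text: that the scaling factor takes the form
$s(z) = (1+z)/\sqrt{\Omega_M(1+z)^3 + \Omega_\Lambda}$, which is the usual
conversion from a fixed velocity interval to a comoving line-of-sight
length in a flat universe, and that $\Omega_M = 0.3$ and
$\Omega_\Lambda = 0.7$. Function names are hypothetical.

```python
import math


def s_factor(z, omega_m=0.3, omega_lambda=0.7):
    """Assumed scaling between a velocity interval and comoving length at z."""
    return (1.0 + z) / math.sqrt(omega_m * (1.0 + z) ** 3 + omega_lambda)


def cylinder_half_length(z, l0, z0=0.7, **cosmo):
    """Rescale a half-length l0, defined at reference redshift z0, to redshift z."""
    return s_factor(z, **cosmo) / s_factor(z0, **cosmo) * l0


# Relative change in cylinder length between the reference redshift and z = 1.4.
ratio = cylinder_half_length(1.4, 1.0)
```

With these assumed parameters the cylinder shrinks by several percent
between $z = 0.7$ and $z = 1.4$, broadly in line with the modest effect
quoted above.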

4.3. Optimizing with mock catalogs 

To gauge the success of our group-finding algorithm, 
we will make use of the mock DEEP2 sample described in 
Section 2.2. Each galaxy in these catalogs is tagged with
the name of its parent halo, making the identification of 
real groups a simple matching exercise. Thus, we have a 
catalog of real groups, identified from N-body models in 
real space, against which we can compare the results of 
applying the VDM algorithm to the mock galaxy catalog 
projected in redshift space. To compute completeness 
and purity, we simply apply the LGF criteria of Section 3 to the real and
reconstructed group catalogs. 

As a rough measure of the accuracy of our reconstructed
distribution, $n_{\rm found}(\sigma, z)$, we apply a
two-dimensional Kolmogorov-Smirnov (K-S) test to this
distribution and the real distribution, $n_{\rm real}(\sigma, z)$, to
determine whether they are statistically distinguishable.
Marinoni et al. (2002) found that the VDM group-finder
should accurately reproduce this distribution above
$\sigma_c \approx 400$ km s$^{-1}$, so we apply the K-S test only above this
velocity dispersion. The test is insensitive to the total
number of groups in each sample, so we must independently
ensure that the two distributions have the same
normalization. To do this, we simply count the total
number of groups with $\sigma > 400$ km s$^{-1}$. We want to
ensure that the real and reconstructed normalizations
match to better than the expected cosmic variance for
our sample (about 12% for the abundance of groups with
$\sigma > 400$ km s$^{-1}$), so that our final errors are dominated


TABLE 2
Parameters used for group finding with the VDM algorithm in this study.

Parameter^a              Optimal    High-purity
$\mathcal{R}_{\min}$     0.3        0.1
$\mathcal{L}_{\min}$     7.8        5.0
$\mathcal{R}_{II}$       0.5        0.3
$\mathcal{L}_{II}$       6.0        5.0
$r$                      0.35       0.25
$\ell$                   14         14

^a All values are given in comoving $h^{-1}$ Mpc.



by cosmic variance. Guided by simple physical
considerations (e.g., the expected velocity dispersion range of
groups and clusters), we explore the space of VDM
parameters $\mathcal{R}_{\min}$, $\mathcal{L}_{\min}$, $\mathcal{R}_{II}$, $\mathcal{L}_{II}$, $r$ and $\ell$, using trial and
error to narrow our parameter range down to a range
that produces an $n_{\rm found}(\sigma, z)$ that is statistically
indistinguishable from $n_{\rm real}(\sigma, z)$ (less than 1% confidence that
the two distributions are different) and properly
normalized. At the same time, we monitor the completeness
and purity of the reconstructed group catalog, requiring,
minimally, that $C_2$ and $P_1$ remain above 50% and
attempting to increase them as much as possible.

The procedure described above is simple to implement
and perform; however, it is ultimately an insufficient test
of our success. It asks only whether or not $n_{\rm found}(\sigma, z)$
and $n_{\rm real}(\sigma, z)$ appear, in a statistical sense, to have been
drawn from the same distribution. But we want to know
whether or not $n_{\rm found}(\sigma, z)$ is an accurate reconstruction
of $n_{\rm real}(\sigma, z)$; for the two distributions to pass a K-S test
is a necessary but not a sufficient condition. In order
to fully optimize our parameters, we must aim to reduce
any systematic error in $n_{\rm found} - n_{\rm real}$ to a level below the
cosmic variance.

Thus, we will want to assess our error in reconstructing
the actual velocity function $n_{\rm real}(\sigma, z)$ in a given
field, irrespective of cosmic variance, and then compare
our reconstruction error to the expected cosmic
variance in that field. As long as the systematic error
is smaller than the cosmic variance, it will not be
a significant source of error in our measurement of the
velocity function. To estimate our systematic error,
we apply the VDM group finder to twelve independent
mock DEEP2 fields and compute the mean fractional residuals
$\langle\delta_n\rangle = \langle (n_{\rm found} - n_{\rm real})/n_{\rm found} \rangle$, which constitute a
measurement of the fractional systematic reconstruction
error. The uncertainty in determining $\langle\delta_n\rangle$ is then given
by the standard error in this quantity, $\sigma_{\langle\delta\rangle}$. We have
used fractional errors here, rather than absolute errors,
to distinguish errors in reconstruction from the intrinsic
scatter (cosmic variance) in $n_{\rm real}$ and $n_{\rm found}$. We can
then measure the fractional cosmic variance (plus Poisson
noise) $\sigma_{\rm cos} = (\langle n_{\rm real}^2\rangle/\langle n_{\rm real}\rangle^2 - 1)^{1/2}$ from the mock
catalogs and compare it to the systematic error $\langle\delta_n\rangle$.
For simplicity of presentation, we first consider the
integrated one-dimensional distributions $n(\sigma)$ and $n(z)$.
Figure 6 shows the fractional systematic errors $\langle\delta_n\rangle$ in
these distributions, and error bars show the uncertainty
$\sigma_{\langle\delta\rangle}$ in determining this quantity. These two quantities

Fig. 6. Fractional errors in measuring $n(\sigma)$ and $n(z)$. Upper
panel: the data points show the fractional systematic error $\langle\delta_n\rangle$ as
a function of velocity dispersion, estimated by running the VDM
algorithm on twelve independent mock DEEP2 pointings. Error bars
show the standard deviation of the mean $\sigma_{\langle\delta\rangle}$, while the shaded
region (light blue in the electronic edition) shows the fractional
cosmic variance (plus Poisson noise) $\sigma_{\rm cos}$ for a single (120 × 30
arcmin) DEEP2 field, in bins of 50 km s$^{-1}$. For $\sigma > 350$ km s$^{-1}$,
the systematic errors are dominated by cosmic variance. Bottom
panel: Fractional errors in $n_{\rm found}$ and fractional cosmic variance,
as a function of redshift in bins of 0.05 in $z$, after groups with
$\sigma < 350$ km s$^{-1}$ have been discarded. Any systematic offsets are
smaller than the cosmic variance.



are measured by applying the VDM group-finder to the
DEEP2 mock catalogs using the "optimal" VDM parameter
set shown in Table 2. Also shown is the fractional
cosmic variance (plus Poisson noise) $\sigma_{\rm cos}$ expected in a
single (120 × 30 arcmin) DEEP2 field. As shown in the
upper panel of the figure, systematic reconstruction
errors in $n_{\rm found}(\sigma)$ are dominated by cosmic variance for
$\sigma > 350$ km s$^{-1}$, while we significantly overestimate
the abundance of lower-dispersion groups. If we
discard the reconstructed groups with $\sigma < 350$ km s$^{-1}$,
systematic errors in the $n_{\rm found}(z)$ distribution are also
smaller than the cosmic variance, as shown in the
bottom panel of the figure. We note that we have been able
to do somewhat better than expected, reconstructing the
velocity function accurately down to $\sigma = 350$ km s$^{-1}$,
slightly lower than the cutoff of 400 km s$^{-1}$ expected
from Marinoni et al. (2002).

The fact that these one-dimensional distributions are
accurately measured to within the cosmic variance is
heartening, but to fully optimize our group-finder we
must ensure that the full two-dimensional distribution
$n(\sigma, z)$ is accurately measured. Figure 7 shows smoothed
contour plots of the mean fractional systematic errors
$\langle\delta_n(\sigma, z)\rangle$ in this distribution, the uncertainty $\sigma_{\langle\delta\rangle}$ in
this quantity, the fractional cosmic variance $\sigma_{\rm cos}$, and
the ratio of the systematic error to the cosmic variance.
The panel at the lower right shows that significant,
correlated overestimates of the distribution are confined to low
velocity dispersion. For $\sigma > 350$ km s$^{-1}$, on the other
hand, the errors are smaller than the cosmic variance
and exhibit no systematic, large-scale bin-to-bin
correlations.

To be somewhat more quantitative, we note that for
$\sigma > 350$ km s$^{-1}$, the average value of the ratio
$(\langle\delta_n\rangle/\sigma_{\rm cos})^2$ shown in the lower right panel
of Figure 7 (before smoothing) is 0.1, and the maximum value is 0.9. Thus
$n(\sigma, z)$ for DEEP2 is reconstructed with sufficient accuracy by the
VDM group-finder for velocity dispersions above 350 km s$^{-1}$. Having
achieved our optimization goals, we may therefore apply the VDM algorithm
to the DEEP2 redshift catalog with confidence that our reconstructed
catalog will produce an accurate and unbiased measurement of $n(\sigma, z)$
for $\sigma > 350$ km s$^{-1}$ (as long as our mock catalogs are a
reasonable representation of the real universe). We note that this minimum
velocity dispersion is not a limitation of the VDM group-finder: a more
densely sampled survey would permit $n(\sigma, z)$ to be reconstructed down
to even lower dispersions (Marinoni et al. 2002). More generally, it is
important to recognize that the conclusions reached here apply only to the
DEEP2 survey: a survey probing significantly greater volume, for example,
would have smaller cosmic variance, perhaps necessitating a more accurate
reconstruction of $n(\sigma, z)$ than has been presented here.
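
The summary statistic quoted above, the mean and maximum of the squared ratio of systematic error to cosmic variance over the high-dispersion bins, can be sketched as follows. The function name, the array layout (velocity-dispersion bins along the first axis, redshift bins along the second) and the bin edges are hypothetical choices, not taken from the paper.

```python
import numpy as np

def error_to_variance_ratio(mean_delta, sigma_cos, sigma_lower_edges, sigma_min=350.0):
    """Mean and maximum of (mean_delta / sigma_cos)**2 over high-dispersion bins.

    mean_delta, sigma_cos: arrays of shape (n_sigma_bins, n_z_bins).
    sigma_lower_edges: lower edge of each velocity-dispersion bin, in km/s.
    sigma_min: only bins starting at or above this dispersion are used.
    """
    mean_delta = np.asarray(mean_delta, dtype=float)
    sigma_cos = np.asarray(sigma_cos, dtype=float)

    # Select the rows (dispersion bins) at or above the threshold.
    keep = np.asarray(sigma_lower_edges, dtype=float) >= sigma_min

    ratio_sq = (mean_delta[keep] / sigma_cos[keep]) ** 2
    return ratio_sq.mean(), ratio_sq.max()
```

Values of the squared ratio below unity in every retained bin correspond to systematic errors that are smaller than the cosmic variance everywhere above the threshold.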

After running the VDM on the 12 mock samples using the optimal parameter
set in Table 2, we obtain mean completeness parameters of
$C_1 = 0.782 \pm 0.006$ and $C_2 = 0.719 \pm 0.005$ and mean purity
parameters of $P_1 = 0.545 \pm 0.005$ and $P_2 = 0.538 \pm 0.005$, with the
quoted uncertainties indicating the standard deviations of the means. As
shown in Figure 8, these statistics are nearly independent of the velocity
dispersion of the groups being considered. The fact that $C_1 - C_2$ and
$P_1 - P_2$ are small suggests that our catalogs are largely free of
fragmentation or over-merging. We also find that most galaxies that belong
to real groups are identified as group members: the mocks yield a mean
galaxy-success rate of $S_{\rm gal} = 0.786 \pm 0.006$. Conversely, the
mean interloper fraction is $f_I = 0.458 \pm 0.004$, indicating that a
majority of the galaxies in our reconstructed group catalogs are real
group members.
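
The two galaxy-level statistics in this paragraph can be computed from per-galaxy membership labels, as in the sketch below. The function name, the use of -1 to mark field galaxies, and the choice to count a real group member as recovered whenever it is placed in any reconstructed group are assumptions for illustration.

```python
import numpy as np

def galaxy_success_and_interlopers(real_group_id, found_group_id, field=-1):
    """Galaxy success rate and interloper fraction.

    real_group_id, found_group_id: one entry per galaxy giving its true and
    reconstructed group label; `field` marks galaxies not in any group.
    """
    in_real = np.asarray(real_group_id) != field
    in_found = np.asarray(found_group_id) != field

    # Fraction of true group members that are placed in some found group.
    s_gal = np.count_nonzero(in_real & in_found) / np.count_nonzero(in_real)

    # Fraction of galaxies placed in found groups that are truly field galaxies.
    f_interloper = np.count_nonzero(in_found & ~in_real) / np.count_nonzero(in_found)
    return s_gal, f_interloper
```

For example, with six galaxies whose true labels are `[1, 1, 2, -1, -1, 2]` and reconstructed labels `[5, 5, -1, 5, -1, 6]`, three of the four true members are recovered and one of the four galaxies placed in groups is an interloper.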

Since the purity is relatively low, it will be difficult to know whether
to believe in the reality of any individual group in our optimal group
sample, although the properties of the catalog as a whole are accurately
measured. To give some sense of the errors encountered in reconstructing
individual groups, we show several examples of group-finding success and
failure in Figure 9. In order to produce a catalog that may be believed
with more confidence on a group-by-group basis, we can optimize the VDM
parameters to maximize the purity (contingent on the requirement that we
still find an appreciable number of groups). The high-purity parameter set
shown in Table 2 gives mean purity measures in the mock catalogs of
$P_1 = 0.825 \pm 0.007$ and $P_2 = 0.815 \pm 0.006$. The completeness
measures are necessarily much lower for this parameter set, however:
$C_1 = 0.284 \pm 0.008$ and $C_2 = 0.277 \pm 0.008$.

We have applied the VDM group-finding algorithm to galaxies in the three
most completely observed DEEP2 pointings using the optimal parameters from
Table 2. Figures 10 and 11 show groups found in pointing 32 (see Table 1),
both as seen on the sky and as seen along the line of sight, projecting
through the shortest dimension of the field. Especially notable in these
diagrams is the clear visual confirmation that groups are strongly biased
tracers of the underlying dark matter distribution: the groups we find
preferentially populate dense regions and filaments. Close-up views of a
few of the larger groups from this pointing can be seen in Figure 12.

Our optimal VDM group-finder identifies a total of 899 groups with
$N \geq 2$ in the three fields considered here, with 32% of all galaxies in
the sample being placed into groups. We note that this percentage is much
lower than that found in the 2dFGRS by Eke et al. (2004) (55%); however,
our observational selection criteria and group-finding methods are
sufficiently different from theirs that detailed comparisons will be quite
difficult. By comparing the volume of the initial search cylinder used in
Phase I of the VDM group-finder to the number density of DEEP2 galaxies in
the range $0.7 < z < 0.8$, we estimate that our groups have a minimum
central overdensity (in redshift space) of $\delta\rho/\rho \gtrsim 100$.
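
The overdensity estimate described above amounts to comparing the galaxy density inside a search cylinder to the mean density. A minimal sketch follows, assuming a cylinder of full length `length` along the line of sight and radius `radius` on the sky that contains at least `n_galaxies` galaxies. The function name is hypothetical, and any input values are placeholders to be supplied by the reader; the actual Phase I cylinder dimensions and DEEP2 number density are not reproduced here.

```python
import math

def cylinder_overdensity(n_galaxies, radius, length, mean_density):
    """Overdensity implied by finding n_galaxies in a cylinder.

    radius, length: cylinder dimensions, in the same length units.
    mean_density: mean galaxy number density per unit volume.
    Returns (density in cylinder) / (mean density) - 1.
    """
    volume = math.pi * radius**2 * length
    return n_galaxies / (volume * mean_density) - 1.0
```

Because the cylinder is defined in redshift space, the result is a redshift-space overdensity, consistent with the caveat in the text.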

In Table 3 we present the locations and properties of the subset of groups
with $\sigma > 350$ km s$^{-1}$ (153 groups). We have also found groups in
the same data using the high-purity parameter set in Table 2. We can match
our two group catalogs by identifying those groups in the optimal catalog
that are the Largest Associated Groups of the groups in the high-purity
catalog. Such groups are noted as "strong" detections in Table 3; they are
highly likely (> 80% chance) to be associated with real virialized
structures. Such strong detections constitute 17% of the total group
sample and 13% of the sample with $\sigma > 350$ km s$^{-1}$.
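
The catalog matching described above can be sketched in Python. The sketch assumes that the Largest Associated Group of a high-purity group is the optimal-catalog group sharing the most member galaxies with it; the function name and the dictionary-based data layout are hypothetical, and ties are not treated specially.

```python
from collections import Counter

def strong_detections(pure_groups, optimal_membership):
    """Flag optimal-catalog groups that match a high-purity group.

    pure_groups: dict mapping high-purity group id -> iterable of galaxy ids.
    optimal_membership: dict mapping galaxy id -> optimal-catalog group id
        (galaxies absent from the dict are field galaxies in that catalog).
    Returns the set of optimal-catalog group ids flagged as strong.
    """
    strong = set()
    for members in pure_groups.values():
        # Count how many members fall in each optimal-catalog group.
        overlap = Counter(
            optimal_membership[g] for g in members if g in optimal_membership
        )
        if overlap:
            # The group sharing the most members is the assumed match.
            strong.add(overlap.most_common(1)[0][0])
    return strong
```

With this layout, the "Strong" column of a catalog could be filled by testing each optimal group id for membership in the returned set.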

Throughout this study, we have focused on reconstructing a group catalog
that provides an accurate measure of the velocity function $n(\sigma)$. The
ultimate goal of measuring cosmological parameters must wait for more
data, but it is interesting at this stage to compare the DEEP2 data to the
predictions from mock catalogs. Figure 13 compares the measured velocity
function $n(\sigma)$ (data points) to the true velocity function
$n_{\rm true}(\sigma)$ (solid line) predicted by the mock DEEP2 sample
described in Section 2.2. The measured velocity functions are
qualitatively consistent with the prediction from the mock catalogs for
$\sigma > 350$ km s$^{-1}$, while the measurements are significantly higher
than the prediction for lower velocity dispersions. In Section 4 we showed
that an accurate reconstruction of the velocity function is expected in
the higher-dispersion regime, while an overestimate of $n(\sigma)$ is
expected at lower velocity dispersions. We intend to exclude
low-dispersion groups from future analyses, so we do not consider the high
measured values for $n(\sigma)$ a particular cause for concern.

However, it is important to note that this comparison (of the measured
velocity function to the "true" velocity function in the mock catalogs) is
not, strictly speaking, the appropriate comparison to make to assess the
similarity of the mocks and the data. A real "apples-to-apples" comparison
would compare the measured $n(\sigma)$ to the mean reconstructed velocity
function $\langle n_{\rm found}(\sigma)\rangle$ derived by applying the VDM
group-finder to all twelve mocks. This quantity is indicated by the dashed
line in Figure 13; the data appear to be consistent with it at velocity
dispersions $\sigma > 300$ km s$^{-1}$.

Below this threshold a slight discrepancy remains: the data points lie
significantly above the dashed line at low dispersions. However, one would
naively expect the velocity function reconstructed from the data to be
everywhere consistent with the one reconstructed in the mock catalogs, if
the mocks are a good simulation of the data. It is difficult to assess the
significance of the discrepancy shown here, since we have not optimized
our group-finder to measure the abundance of such low-dispersion groups,
but it appears that the mock catalogs may be inconsistent with the DEEP2
data on small velocity scales. Nevertheless, it is clear that the mocks
are consistent with the data in the high-dispersion regime, so this
comparison confirms that the mock catalogs are an accurate simulation of
DEEP2 data for our purposes. The more scientifically interesting
"data-to-prediction" comparison shown by the solid line in Figure 13 then
stands as evidence that an accurate reconstruction of the velocity
function for $\sigma > 350$ km s$^{-1}$ is possible in DEEP2, providing the
first step toward placing constraints on cosmological parameters.
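
As a rough illustration of the binned velocity function compared throughout this section, the sketch below counts groups in bins of 50 km s$^{-1}$ and divides by a survey volume. The function name, the upper limit of 1000 km s$^{-1}$, and the normalization (groups per bin per unit volume) are assumptions; the paper's own normalization may differ.

```python
import numpy as np

def velocity_function(sigmas, volume, bin_width=50.0, sigma_max=1000.0):
    """Binned abundance of groups as a function of velocity dispersion.

    sigmas: velocity dispersions of the groups, in km/s.
    volume: survey volume used for normalization (assumed known).
    Returns (bin_edges, number of groups per bin per unit volume).
    """
    edges = np.arange(0.0, sigma_max + bin_width, bin_width)
    counts, _ = np.histogram(sigmas, bins=edges)
    return edges, counts / volume
```

Applying the same function to the data and to each reconstructed mock catalog, then averaging over the mocks, would give the like-for-like comparison described in the text.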

6. DISCUSSION AND CONCLUSIONS 

We have optimized the VDM group-finding algorithm using mock catalogs
designed to replicate the DEEP2 survey, and we have applied it to
spectroscopic data from three DEEP2 photometric pointings. In the process
of optimization, we have defined a measure of group-finding success that
focuses on accurately reproducing the overall properties of the group
catalog, in particular the distribution of groups in redshift and velocity
dispersion, $n(\sigma, z)$, while also paying some attention to the
accurate reconstruction of individual groups. Tests on DEEP2 mock catalogs
show that we are able to accurately reproduce $n(\sigma, z)$ for
$\sigma > 350$ km s$^{-1}$ and that errors in measuring this quantity in
DEEP2 should be smaller than its expected intrinsic cosmic variance. It
should thus be possible to use the test described by Newman et al. (2002)
to constrain cosmological parameters, including the dark energy equation
of state parameter, $w$.

We find 899 groups with two or more members within the DEEP2 data
considered in this study, which make up roughly 25% of the expected final
sample. Of these, 153 have velocity dispersions $\sigma > 350$ km s$^{-1}$.
The distribution of these reconstructed groups with velocity dispersion,
$n(\sigma)$, is in good agreement with the distribution for real groups in
DEEP2 mock catalogs. This result provides a useful consistency check for
the mock catalogs: assuming our reconstructed $n(\sigma)$ is accurate (as
our tests show that it is for $\sigma > 350$ km s$^{-1}$), we may be
confident that the properties of groups in the mock catalogs are an
accurate simulation of real DEEP2 data. This is especially important in
the context of the so-called velocity bias, which is the ratio of the
velocity dispersion of galaxies to that of the underlying dark matter
halo, $b_v = \sigma_{\rm gal}/\sigma_{\rm DM}$. Various studies of N-body
simulations (e.g., Diemand et al. 2004 and references therein) have
suggested that $b_v \neq 1$ at the 15-30% level, but no such effects have
been included in the DEEP2 mock catalogs. Our results on $n(\sigma)$ thus
indicate that our data are consistent with $b_v = 1$ within the
measurement errors shown in Figure 13. This
Fig. 9. Examples of group-finding success and failure in the DEEP2 mock catalogs. In each panel, squares indicate galaxies in the
is no surprise, since our current error bars are considerably larger than
the expected effect; nevertheless, this result may be viewed as
confirmation that no stronger biases exist.

Successful detection of groups and clusters within the DEEP2 redshift
survey is an essential first step for a wide variety of planned studies.
By comparing the properties of galaxies in groups to the properties of
isolated galaxies at high redshift, we can learn much about galaxy
formation and evolution. This will be discussed further in an upcoming
paper (Gerke et al., in prep.). Also, with the catalog of groups we now
have in hand, it is possible to pursue targeted follow-up observations in
X-rays or using the Sunyaev-Zeldovich effect to better constrain the gas
physics of groups at high redshift; such programs are now being developed,
including an upcoming X-ray survey of the Extended Groth Strip field with
the Chandra space telescope. Finally, using the groups we find in our
current and future spectroscopic data, we expect to put strong new
constraints on the formation and evolution of galaxies, groups, and
clusters, and to investigate the makeup of the universe, including the
nature of the dark energy.

Fig. 12. Close-up views of four DEEP2 groups. Shaded squares (colored in the electronic edition) indicate galaxies in the group being
considered, shaded (colored) triangles indicate galaxies in nearby groups, and black crosses indicate nearby field galaxies. Each group is
shown in three projections: one as seen on the sky and two along the line of sight.



TABLE 3

Locations and properties of groups with $\sigma > 350$ km s$^{-1}$.

RA$^{a,b}$   Dec$^{a,b}$   $z^{b}$   $\sigma^{c}$   $N$   Strong$^{d}$

Fig. 13. Comparison of the velocity function $n(\sigma)$ measured in
three DEEP2 pointings (see Table 1) to that predicted by mock catalogs.
The data points are the measured velocity function, in bins of
50 km s$^{-1}$, of groups found in the three pointings. Error bars are
estimated by applying the VDM group-finder to twelve independent mock
DEEP2 pointings, measuring $n(\sigma)$, and taking the standard deviation
of the fractional residuals $\delta_n$ to estimate a fractional error in
each bin. For clarity, error bars are only shown for one data point in
each bin. The solid line is the average "true" velocity function from the
mock DEEP2 catalogs (see Section 2.2), $\langle n_{\rm true}(\sigma)\rangle$,
in bins of 50 km s$^{-1}$, and the shaded region (light blue in the
electronic edition) indicates the combined cosmic variance and Poisson
noise in each bin, for a single DEEP2 pointing. The dashed line shows the
average reconstructed velocity function,
$\langle n_{\rm found}(\sigma)\rangle$. The measured velocity function is
consistent with the mock catalogs in the regime $\sigma > 350$ km s$^{-1}$
(demarcated by the dotted line), where an accurate measurement is expected.




16 51 58 
16 49 52 
16 52 58 
16 52 25 
16 49 48 
16 49 54 
16 49 48 
23 29 14
23 30 35
23 29 14
23 28 51

a Positions on the sky are given in J2000 sexagesimal coordinates.
b Median value of all galaxies in the group.
c Given in km s$^{-1}$.
d Groups detected in both the standard and high-purity group catalogs are indicated as strong detections.



